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ABSTRACT 

We measure planet occurrence rates using the planet candidates discovered by the Q1-Q16 Kepler 
pipeline search. This study examines planet occurrence rates for the Kepler GK dwarf target sample 
for planet radii, 0.75<i?p<2.5 R®, and orbital periods, 50<Porb<300 days, with an emphasis on a 
thorough exploration and identification of the most important sources of systematic uncertainties. 
Integrating over this parameter space, we measure an occurrence rate of i’o=0.77 planets per star, 
with an allowed range of 0.3< Fq <1.9. The allowed range takes into account both statistical and 
systematic uncertainties, and values of Fq beyond the allowed range are significantly in disagreement 
with our analysis. We generally find higher planet occurrence rates and a steeper increase in planet 
occurrence rates towards small planets than previous studies of the Kepler GK dwarf sample. Through 
extrapolation, we find that the one year orbital period terrestrial planet occurrence rate, ('i.o=0.1, 
with an allowed range of 0.01< Ci o <2, where Ci.o is defined as the number of planets per star within 
20% of the Rp and Porb of Earth. For G dwarf hosts, the Ci.o parameter space is a subset of the 
larger parameter space, thus Q.o places a lower limit on for G dwarf hosts. From our analysis, 
we identify the leading sources of systematics impacting Kepler occurrence rate determinations as: 
reliability of the planet candidate sample, planet radii, pipeline completeness, and stellar parameters. 
Subject headings: eclipses - methodsistatistical - planetary systems - surveys - space vehicles - cata¬ 
logs - stars:statistics 


1. INTRODUCTION 

The Kepler data set is the only currently available 
experiment capable of detecting and characterizing the 
planetary content of the Milky Way down to the regime 
of ter restrial planets orbiting within I AU of solar-type 
stars l)Borucki et al.l[2?)T(Tl : iKoch et al.ll201(T[l . The ongo¬ 
ing detailed analysis of Kepler data can provide accu¬ 
rate and precise measurement of the occurrence rate of 
Earth analogs beyond the Solar System, which is a crit¬ 
ical parameter for underst anding the potential for life 
outside the Solar System (iDrakg [20131 lPrantzosll20I3[l 
and a quantity of immense interest across the disciplines 
of science, engineering, philosophy, and sociology. 

Kepler builds upon the enormous efforts of the 
astronomical community in hlling out the param¬ 
eter space of planet properties i n nu m erous stellar 
environments (iFischer k, Valenti! 120051 iMarcv et al.l 
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: INaef et al.l 120051: IGould et al.l 120061 ISahu et al. 
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Gumminer et al.l 20081: Bowler et al. 20101: 

Howard et al.ll201ft Bavliss & SackettI 201lHMavor et al. 

2011 

Ivan Saders & Gaiidi 

120111: iWriffht et al. 

2012 

; iBonfils et al.l 12013 

iMeibom et al.l 120131: 

Glanton & Gaudil 120141. Kepler expands the plan- 


etary discovery space to terrestrial planets within I AU 
of solar-type stars. Kepler data enables the study 
and simulations of planetary formation to finally be 
confronted with the ir predictive outc o mes in the r egime 
of rocky planets (llda &: Lin! 12004 iBenz et ^ 12014 
iMordasini et al.ll2015f) . In addition, Kepler constraints 
on the prevalence of rocky planets in the habitable zone 
(HZ) of nearby stars provides a key input in defining the 
scope of future missions that will probe the atmospheres 
of extrasolar planets (i Dressing fc Gharbonneaul 2013; 

Kouveliotou et al.ll20l4Batalhall20l4lLeger et al.l 2015 : 

Stark et al. iMi: 


A substantial shortcoming for planet occurrence rate 
determinations using the Kep l er pipe lin e planet can¬ 


didate samples (6 

Borucki et alJ I2011allbl: IBatalha et al. 

2013 

: IBurke et al.l 

20l4IR.owe et al.ll2015l:IMullallv et al. 

2015 

) has been the unavailability of an accurate model 


tor t he completeness of the Kepler pipeline (iBatalhal 
I20l4) . Previous planet occurrence determinations from 
Kepler data have dealt with this shortcoming by em¬ 
ploying simplifying assumptions as to the pipeline com¬ 
pleteness: assume the theoretical perfo rmance of the 
Transiting Planet Search algorithm ITPS Ijenkinsll20r)^ 
with a signal-to-nois e ratio (SNR) threshold of 7.1 
(iBorucki et al.l l2011bll . designate a higher SNR level 
where the planet sample is close to 100% complete 
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(iCatanzarite fc Sha^ 120111: f^udid I201H [Howard et alJ 


20121: Traud ^121: Dressing fc Charbonneaul 120131 : 

Dong &: Zhull2013l : lSilburt et aUboiSD . or simultaneously 


solve for a parameteri zed completeness model in addition 
to planet occurrence jFressin et alJ 120131 iMulders et alJ 
l2015l : lFarr et al.ll2014ll . Others avoid this shortcoming al¬ 
together through an independent planet search pipeline 
and pi pe line completeness measurement (|Petigura et alJ 
l2013allbl: [Dressing fc Charbonnea^jl2015ll ■ 

lChristiansen et al.l (j2015(l rectify this shortcoming by 
directly measuring the Kepler pipeline completeness o f 
the Q1-Q16 ifepler pipeline run (|Tenenbaum et aO20l4l 
through Monte-Carlo transit injection and recovery tests. 
In thi s study, we make use of the [Christiansen et ^ 
(|2015f) Kepler pipeline completeness parameterization in 
order to derive the planet occurrence rates from the 
resulting Ql-016 K epler planet candidate sample of 
iMullallv et alJ p015fl . Another highlight of this study is 
a comprehensive analysis of the systematic errors present 
in deriving pla net occu r rence rates with Kepler data. A s 
exemplified in lYoudinl (1201111 and iDong fc Zhul (l2(I13ll . 
we undertake a sensitivity analysis where we iteratively 
change an input assumption and recalculate the occur¬ 
rence rates. We investigate the following input assump¬ 
tions: pipeline completeness systematics, orbital eccen¬ 
tricity, stellar parameter systematics, planet parameter 
systematics, and planet sample classification systemat¬ 
ics. 

This paper is organized as follows. Section [2] describes 
the pipeline completeness model that quantifies the sur¬ 
vey completeness for any target observed by Kepler. Sec- 
tions[3]and|T]summarize the stellar properties and planet 
sample from the Q1-Q16 ifepZer pipeline run adopted for 
derivation of the plane t occurr e nce ra tes. We extend the 
analysis techniques of lYoudii] (j2011f l by increasing the 
complexity of the parameterized model for the planet 
occurrence rate and employ Markov Chain Monte-Carlo 
(MCMC) methods for solving the parameter estimation 
problem in Section [5] Section [01 presents results for the 
planet occurrence rate using a baseline set of inputs, and 
we thoroughly explore the systematic errors in this result 
through a sensitivity analysis in Section 16.21 We com¬ 
pare the occurrence rate analysis with previous efforts 
in Section [T] We apply the resulting occurrence rates 
to determine the occurrence rate for terrestrial planets 
with an orbital period equivalent to Venus in Section |9] 
as well as extrapolating these results toward longer peri¬ 
ods (SectionlH) in order to measure a one year terrestrial 
planet occurrence rate in Section [TUI Finally, Section ITT] 
summarizes the future work necessary to improve the ac¬ 
curacy for the resulting planet occurrence rates. 


2. Kepler PIPELINE COMPLETENESS MODEL 

This section details an analytic star-by-star model 
for the Kepler pipeline completeness. A critical 
component for modeling the completeness of Ke¬ 
pler observations is simulating the performance of 
the TPS pipeline module which is responsible for 
characterizing the noise present in a light curve 
and detection of the trans it sign als (j.Tenkin^ 120021 : 
iTenenbaum et al']l2012L I2013L 1201411 . The performance 
of a transit survey can be fully specified with in¬ 
tensive, end-to- end Monte Carlo signal i njection and 
recovery tests (jWeldrake &: Sacked l2005t I Burke et al.l 


20061: iHartman et al.l 120091 : iChristiansen et all 120131 : 
Seader et al.l 1201411 . Unfortunately, due to their numer¬ 
ically intensive nature, Monte-Carlo injection tests are 
not amenable to a systematic sensitivity analysis, and 
the tests are limited to the subset of targets that one 
performs the analysis upon. Therefore, we present a sim¬ 
plified analytic model for the Kepler pipeline that can 
be readily applied to any observed Kepler target using 
a minimum of input data. Fortunately, the joint noise 
characterization, filtering, and detection properties of 
TPS were designed to facilitate a well defined and tested 
detector response for transit signals even in t he pres¬ 
ence of astrophysical broad-band or red noise ([Jenkinsl 
1200211 . Given the well defined properties of the TPS de¬ 
tector, our analytic completeness model can achieve high 
fidelity after it is calibrated with Monte-Carlo injection 
tests. For a single target, we parameterize the pipeline 
completeness over a two-dimensional grid of orbital pe¬ 
riod, Porb, and planet radius, Rp. 

2.1. Multiple Event Statistic Estimation 

Modeling pipeline completeness requires modeling the 
statistical behavior of TPS and its response to noise 
in th e presence of a signal (I Jenkinsl 120021 : iSeader et al.l 
1201311 . In the presence of broad-band red noise, TPS 
considers the so-called Multiple Event Statistic (MES) 
to measure the strength of a potential transit signal. In 
the null hypothesis case of no signal present, the MES 
distribution is Gaussian with an average of zero and unit 
variance. In the alternative hypothesis case for the pres¬ 
ence of a signal, the MES distribution is Gaussian but 
the average MES is shifted proportional to the SNR of 
the transit signal. The first step for modeling pipeline 
completeness is to estimate the expected MES of a tran¬ 
sit signal for a specified Porb and Rp. This requires an 
estimate of the expected transit duration. 



where e is the orbital eccentricity and the stellar ra¬ 
dius, P*, and orbital semi-major axis, a, are in a con¬ 
sistent set of units. In Equation (P), we assume Pp<CP*, 
shorten the transit duration from the central crossing 
time by a factor of 7r/4 for its expectation assuming a 
uniform distribution o f cosi for the orbital inclinatio n 
(| Gilliland et al.l 120001: ISeager fc Mallen-OrnelasI I2003f) . 
and include th e expected d ependence on the transit du¬ 
ration with e (iBurkel I 2 OO 8 II . We explore the sensitivity 
of our results to e > 0 in Section 16.2.21 
Next, we determine the noise present in the light 
curve data averaged over the transit duration of 
interest. TPS estimates the time varying noise 
present in a light curve, the so-called Co mbined Dif¬ 
feren ti al Photometric Precisio n (CDPP, iKoch et al.l 
I 2 OIOI: IChristiansen et ^ l2012h . CDPP varies with 
time and is calculated over the same 14 transit 
durations, Tdur.srch = [1.5, 2.0, 2.5,3.0,3.5,4.5, 5.0, 

6.0, 7.5, 9.0,10.5,12.0,12.5,15.0] hr, that are used 
in the transiting plan et search (see Figure 3 of 
IChristiansen et al1l2012[) . For the analytic completeness 
model, we employ a summary statistic, a robust root- 
mean-square of the CDPP (robCDPP), for the light 
curve noise. In testing it was found that the non-robust 
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root-mean-square CDPP (rmsCDPP) statistic typically 
reported by the Q1-Q16 pipeline data products (SOC 
build 9.2 and earlier) can be biased. The bias in the 
rmsCDPP arises when the distribution formed from 
the CDPP time series values is asymmetric with the 
outlying tail of the distribution inordinately affecting 
the results. 

We calculate robCDPP by replacing the typical compo¬ 
nents of the root-mean-square that includes the mean/dc 
component 

= ( 2 ) 

where x is the mean and a is the standard deviation, 
with their robust equivalents 

2^rob.rms = median(a;)^ -I- (1.4826 x MAD)^, (3) 

where MAD is the median absolute deviation. The 
robCDPP is calculated after removing both invalid and 
deweighted data identified in the same manner as during 
the transit search. In order to remove the influence of 
strong signals in estimating the noise present in a light 
curve, the robCDPP value adopted for the completeness 
model is calculated on the light curve after all the po¬ 
tential transit signatures have been rem oved in the Dat a 
Validation (DV) multiple planet search (|Wu et al.ll201Clll . 
On average, the robCDPP to rmsCDPP ratio is 1.03 with 
a sample standard deviation of 0.06. A higher fidelity 
model of pipeline completeness would employ the full 
CDPP time series. However, in tests we find that when 
the number of expected transits contributing to a detec¬ 
tion, Wrn ^ 5, the robCDPP summary is sufficient to 
model pipeline completeness rather than a more time- 
consuming calculation involving the full CDPP time se¬ 
ries (see Section mH). 

For a given Tdur, we interpolate within the grid of 14 
robCDPP values to estimate the noise for that duration, 
Ccdpp- For values of Tdur outside the Tdur.srch grid, we 
adopt the end point robCDPP for CTcdpp- Alternative ex¬ 
trapolation methods such as assuming a (Tcdpp oc 1/^Tdur 
dependence or linear extrapolation were not stable for 
the sometimes complicated behavior of robCDPP as a 
function of Tdur- 

The next step is to estimate the expected transit sig¬ 
nal depth, A. Since the TPS search algorithm employs 
a square box-car signal template match to the signal 
for detection, the appropriate A is the average signal 
depth over Tdur rather than the purely geometric depth 
S = k'^ = (i?p/i?*)^, where k is the physical radius ra- 
tio. From the res u lts of the transit injection study of 
iChristiansen et id] ()2015[1 . we determine that for limb 
darkened transit signals, on average A = O.SdAmax, 
where Amax is the depth at closest approach or max¬ 
imum transit depth. When averaged over a uniform 
distribution of impact parameter , &, we fi n d usin g the 
limb darkened transit model of ICimenej (j2006[ l that 
Aniax/A^ = c -|- s fc is well fit by a linear relationship 
with parameters c and s that vary with the limb dark¬ 
ening profile of the stellar intensity. When using a linear 
limb darkening law, / = 1 — m(1 — cos9), where I is 
the stellar intensity relative to the stellar center, u is the 
linear coefficient, and 6 is the line-of-sight stellar-surface- 
normal angle, with a coefficient appropriate for G dwarfs, 
u = 0.6, we determine best fit values of (c = 1.0874, 


s = 1.0187). We note that c and s are weakly depen¬ 
dent upon limb darkening, taking on values (c = 1.0696, 
s = 1.001) for u = 0.5 and (c = 1.1068, s = 1.0379) for 
u = 0.7. The needed value of A is expressible in terms 
of k using the previous equations to provide the single 
transit event SNR, A/crcdpp- 

The MES is calculated by averaging the transit sig¬ 
nal strength over multiple transit events. The resulting 

MES='\/Ati-nA/fJcdpp: where A^tm — (Tobs/-Forb) ^ /duty 

is the expected number of transit events. Fobs is the time 
baseline of observational coverage for a target and /duty 
is the observing duty cycle. The observing duty cycle, 
/duty, is defined as the fraction of Fobs with valid obser¬ 
vations. The Kepler spacecraft experiences planned data 
gaps for data download and other spacecraft operations. 
These data gaps result in an overall duty cycle of ~ 95% 
for targets observed for all quarters. In addition, the duty 
cycle is suppressed further in the transit search to ~ 88% 
by a set of data weights applied in TPS. Data near gaps 
suffers from spacecraft systematics, thus TPS deweights 
data near gaps using a smooth exponential decay func¬ 
tional form that goes from fully deweighted data at the 
gap edge to full data weighting over a span of 2 days. We 
calculate /duty by dividing the number of cadences with 
an overall deweighting factor > 0.5 by the total number 
of cadences within Fobs- In addition, we force a floor 
of Ntvn > 3 in the MES estimate since TPS requires at 
least 3 transit events for detection. Fobs and /duty are 
calculated in the initial call to TPS before transit signals 
have been identified and removed by DV. 

2.2. Pipeline Completeness Modeling 

The TPS search algorithm design results in a well- 
defined pipeline completeness that is, in the limit of 
broad band noise, a function of MES alone (j.lenkin^ 
1200211 . Since the TPS detection statistic, MES, is dis¬ 
tributed as a Gaussian with unit variance, the pipeline 
completeness (fraction of transit signals present in the 
data that are recovered by the pipeline) has a theoretical 
expected form of 

(MES - MESthresh)] 

whe re MESthresh = 7.1 i s the adopted detection thresh¬ 
old (j.lenkins et al.l 1^021 : l.lenkinsl 1200^ . However, the 
presence of stochastic, impulsive systematics of instru¬ 
mental or astrophysical origin in the time series that are 
not due to a transit signal results in an overwhelming 
number of false alarm detections when MES is the only 
criteria employed for detection. In the Q1-Q16 pipeline 
run, 57% of t argets result in a detectio n when based upon 
MES alone (|Tenenbaum et aP 1201411 . To mitigate the 
false positive detections, additional criteria (or vetoes) 
are employed that quantify how consistent the depths, 
shape, and duration of individual events are with eac h 
other (|Tenenbaum et al]l2013l . [20l^ iSeader et al.ll2013ll . 
The additional vetoes cause the pipeline completeness 
to be suppressed relative to the theoretical expectation 
gi ven in Equation ffl _ 

IChristiansen et alT l)2015ll quantify the resulting sup¬ 
pression of the pipeline completeness through Monte- 
Carlo transit injection and recovery tests. They find that 
the gamma cumulative distribution function (CDF) pro- 


Fdet(MES) = \ + 



































4 


vides a good fit to the suppressed pipeline completeness, 

i^gamma(x|o,&) = j SXp"*/^ A, (5) 

where r(a) is the gamma function and the argument to 
the gamma CDF, x = MES — 4.1 — (MESthresh — 7.1), is 
related to MES by an offset of 4.1 in order to achieve a 
good fit. The parameters for the gamma CDF adopted 
for this study are a=4.65 and 6=0.98. In rare cases, due 
to the timeout limits of the TPS planet search, MESthresh 
is higher than the normal MESthresh=7.1. Section lOl can 
be used to provide a mapping for the 2D grid of Rp and 
Porb onto MES. The pipeline detection efficiency provides 
a mapping from the MES to the pipeline completeness. 


2.3. Window Function 

The final component of the pipeline completeness 
model accounts for the limits of the data coverage for 
meeting the planet search detection requirement of hav¬ 
ing at least iVtr > 3. The transit survey window 
function, Pwin, quantifies the probability that a req¬ 
uisite number of transits required for detection occurs 
in th e observational data (IGaudill2000Hvon Braun et al.l 
I2009t iBurke &: McCulloughI 1201^ Since Kepler op¬ 
erates in the high duty cycle regime, we adopt the 
binomial analytic w i ndow function as discussed in 
IBurke fc McCulloughI (1201411 and originally introduced 
by iDeeg et al.l (|2004ll . The analytic window function 
matches the average behavior o f the fully numerical 
wind ow function (see Figure 9 of IBurke fc McCulloughI 
l2014fl . and requires as in put Tnhs, Pirh. and /riii t y. Fo l- 
lowing Appendix A.4 of IBurke fc McCullouglJ (|2014f) . 
the window function probability of detecting at least 
three transits can be explicitly written out in the bino¬ 
mial approximation as 


^M-l 


Pwin,>3 = 1 - (1 - /duty)“ " M/duty(l " /duty) 


where M = Tobs/Porb- The final completeness model 
results from Pcomp ~ Pwin ^ Pdet- 
For targets with data in all Q1-Q16 quarters, the 
analytic window function predicts Pwin ,>3 > 0.98 for 
Porb<300 days. We compare the impact on Pcomp be¬ 
tween using the analytic window function and a full nu¬ 
merical window function in Section 12.41 


2.4. Worked Example and Limitations 

As a worked example, we demonstrat e the calcula¬ 
tion of Pcomp for the host to Kepler-22b (jBoriicki et al.l 
1201211 . This target with Kepler Input Catalog (KIC) 
identifier 10593626 has stellar parameters P*=0.98 Rq, 
Teff=5640 K, l og g=4.44 as c ompil ed by the Q1-Q16 stel¬ 
lar catalog of iHuber et al.l (|2014ll . In order to gener¬ 
ate Pcomp over the two-dimensional space of Rp and 
Porb shown in Figure [U we employ the following in¬ 
put values: /duty = 0.879, Tobs = 1426.7 days, e = 
0, MESthresh = 7.1, and the 14 robCDPP noise esti¬ 
mates, o-cdppM = [36.2,33.2,31.0,29.4,28.0,26.1,25.4, 
24.2,23.1,22.4,21.9,21.8,21.7,21.5] ppm. The quanti¬ 
ties necessary to generate Pcomp for all Kepler targets 
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Fig. 1.— Fractional completeness model for the host to Kepler- 
22b (KIC: 10593626) in the Q1-Q16 pipeline run using the analytic 
model described in Section [2] 
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Fig. 2. — Absolute difference for pipeline completeness between 
the analytic model described in Section [2] and a more accurate nu¬ 
merical pipeline completeness model that employs the full CDPP 
time series and numerical window function for the KIC target 
10593626. 

searched for planets in the Q1-Q16 pipeline run are avail¬ 
able as part of the Q1-Q16 Kepler stellar table hosted by 
the NASA Exoplanet Archive.® 

In order to investigate potential biases in the analytic 
pipeline completeness model, we show the absolute differ¬ 
ence between the analytic model presented in this study 
and a higher fidelity completeness model that is avail¬ 
able for future pipeline runs in Figure [2j The higher 
fidelity completeness model replaces two components of 
the completeness model described in Section[21 First, the 
analytic window function approximation is replaced by a 
full numerical window function calculated during TPS to 
take into account data gaps and data deweighting consis¬ 
tent with the planet search. Second, the simplified MES 
estimate of Section 12.11 that employs the robCDPP val- 


® http://exoplanetarchive.ipac.caltech.edu 
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ues is replaced by the “l-cr depth function” (ISDF). For 
the ISDF, TPS quantifies the transit signal depth that 
yields a MES=1 as a function of Porb for all 14 transit 
durations searched taking into account the full details of 
the full CDPP time series and deweighted data. 

Differences between the completeness estimates are 
largest (~ 0.06) towards longer p eriods (Porb> 300 days). 
The occurrence rate is oc P^omp l|YoudinlF2011ll . thus er¬ 
rors in the pipeline completeness can propagate directly 
to occurrence rates. Future study is needed in order to 
characterize the net impact of the higher fidelity com¬ 
pleteness model on occurrence rates for a full sample of 
Kepler targets. 

Although this initial test affirms the efficacy of the an¬ 
alytic completeness model for this well behaved Kepler 
target, this comparison does not fully test all its sim¬ 
plifying assumptions. The above test does not check the 
accuracy of our assumption of a simple dependence of the 
pipeline completeness on MES alone and the adopted A 
to k conversion. Due to a single injection per target, 
[Christiansen et ^ (|2015f) cannot rule out the possibil¬ 
ity that the pipeline completeness may depend on ad¬ 
ditional parameters beyond MES and contain a strong 
star-by-star dependence. Results for small samples or 
individual targets may systematically differ from the av¬ 
erage pipeline co mpleteness results o fjCh ristiansen et al.l 
(I2015fl . However. [Christiansen et al.l (j201^ have charac- 
terized the average pipeline completeness as a function 
of MES when averaged over a large sample of targets 
as is the case in this study. Future studies will focus 
on characterizing the star-by-star dependence of pipeline 
completeness by comparing the simplified completeness 
model outlined in this study to the higher fidelity com¬ 
pleteness model using the full numerical window function 
and Icr depth function for larger samples of targets. In 
addition, we are implementing support for ^ 10^ transit 
injections on a single target emplo ying the NASA Am es 
Pleiades super computing facility ISeader et al.[[20l3l . 

An additional shortcoming of any pipeline complete¬ 
ness model is the inescapable dependence on the assumed 
stellar parameters, eccentricity, and stellar binarity. Stel¬ 
lar parameters and eccentricity are employed in defining 
Tdur which sets the time scale relevant for the integrated 
noise level, and stellar binarity can result in third light 
contamination that impacts the assumed planet radius. 
In this study, we treat the pipeline completeness as hav¬ 
ing no uncertainty due to the incomplete understanding 
of the stellar parameters, eccentricity, and stellar bina¬ 
rity. However, we do explore sensitivity in the derived 
planet occurrence rates to alternative assumptions for 
the stellar parameters and non-zero eccentricities in Sec¬ 
tioning) 

Although it is a distinct process separate from the 
pipeline generation of TCEs, the vetting classification 
process of TCEs into planet candidates and false posi¬ 
tives also shapes the overall completeness of the planet 
candidate sample, and the vetting relied upon on a 
manual classification (i.e., human inspection) procedure 
(|Mullallv et al.l[2015[) . The TCE vetting process has an 
unquantified false negative rate of incorrectly classifying 
valid planet candidates as false positives and unconscious 
human biases and/or errors. For this study, we assume 
the vetting process is 100% complete, unbiased, and cor¬ 


rect. However, we do investigate the sensitivity of our 
results on this assumption in Section [6.2.4[ by varying 
the planet candidate sample. 

The analytic pipeline completeness model and input 
data presented in this study are only relevant to the 
TCE population genera ted by the Q1-Q16 pipeline run 
(|Tenenbaum et al][2014ll using the SOC 9.1 softwa re re¬ 
lease. The next TCE release (|Seader et al.l[2015ll used 
the SOC 9.2 software, which introduced changes to the 
data analysis and planet search algorithm that influences 
the pipeline sensitivity. In SOC 9.2, TPS implemented 
a bootstrap noise characterization algorithm during the 
search in order to recalibrate the detection threshold 
l|Seader et al.l[2015ll . The bootstrap noise characteriza¬ 
tion test allows the effective MES threshold to be a func¬ 
tion of Porb as opposed to being independent of Porb in 
the SOC 9.1 software release. In addition, the box-car 
signal template matched to the data in TPS was replaced 
by an average limb-darkened signal template to yield a 
better signal match. However, changing the match tem¬ 
plate influences the noise statistics (such as robCDPP) 
and the A to fc conversion. Finally, the relation be¬ 
tween pipeline completeness and MES is being analyzed 
through Monte-Carlo transit injection using the updated 
software release. 

3. STELLAR PROPERTIES 

Precise and accurate planet occurrence rates depend on 
precise and accurate stellar properties. Stellar properties 
influence three areas relevant to planet occurrence rates: 
the measured planet radii, the estimated transit dura¬ 
tion, and geometric transit probability. In this study, we 
ad opt stellar paramet er s from the 0 1 -QI6 KIC revision 
of [Huber et al.[ (j2014[ l. [Huber et all (j20I4[ l update Ke¬ 
pler target stellar parameters by adopting literature val¬ 
ues and additional observations (asteroseismology, spec¬ 
troscopy, and photometry) that ha ve become availabl e 
since the original KIC observations ([Brown e t al.[[20 11ll. 
With an improved observational database. [Huber et al.l 
([20I4I1 derive stellar parameters by fitting the observa¬ 
tions to iso chrones from the D artmouth Stellar Evolution 
Database ([Potter et al.[[20r)^ using a minimization. 

For this study we focus on planet occurrence for the G 
and K dwarf sample observed by Kepler. Previous occur¬ 
rence rate calculations indicate significant variations in 
the planet pop ulation as a function of stellar Tafr for the 
Kevler sample dHoward et al.l[2(112l : [Mulders et al.[[2?)T5l : 

[Burke et al.l[2015[ l. In order to simplify the planet occur¬ 

rence model by avoiding the dependence on stellar T^g, 
we focus on th e GK dwarfs rather than the full FGKM 
dwarf sample. [Burke et al.l ()2015[1 find that the planet 
occurrence rates agree when the G and K dwarf Kepler 
targets are analyzed separately in an Rp and Porb pa¬ 
rameter space similar to this study. We select G and K 
dwarfs by making the following cuts on the stellar pa¬ 
rameter catalog of [Huber et al.[ (|2014ll : 4200<Teff<6100 
K and P*< 1.15 Rq. In addition, we focus on the Ke¬ 
pler targets with nearly continuous coverage over the en¬ 
tire Q1-Q16 data span of the mission. We select targets 
with an observation data spanning Fobs > 2 yr, duty 
cycle /duty > 0.6, and a robCDPP< 1000 ppm at the 
7.5 hr transit duration. The lower limit on /duty en¬ 
sures the inclusion of the targets that are impacted by 
the CCD electronics loss in Q4 of the Kepler mission 
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Teff [K] 

Fig. 3.— Stellar Teff as a function of log g for the obse r ved K epler 
targets from the Q1-Q16 stellar catalog of lHuber et al.l 1)201411 (red 
points), and the GK dwarf targets selected for this study (gray 
points). For efficient plotting only a randomly selected subsample 
of the full catalog is shown. 



Period [day] 


Fig. 5.— Pipeline completeness model, Pcomp, averaged over the 
GK dwarf sample is shown by the contour levels over the Porb 
■R p parameter space. T he Q1-Q16 Kepler planet candidate sample 
of IMullallv et all II2Q15I) found around the GK dwarf sample is also 
shown (orange points). 



Fig. 4.— Distribution of the robCDPP at the 7.5 hr time scale 
as a function of Kp for the G and K dwarfs analyzed in this study. 
Mean properties of the target stars as a function of Kp are provided 
in Table (2 

(|Batalha et al.l [201^ . The above cuts result in 91,567 
targets in our sar nple. Figure 3l show s Teff and logg for 
the full catalog of lHuber et al.l (1201411 (red points) along 
with the Kepler targets selected using the above criteria 
(gray points). Table [T] provides the Kepler identification 
number and a binary flag to indicate that the target be¬ 
longs to the baseline stellar sample selected by the above 
criteria. Adopting Tefr=5200 K as the dividing line sep¬ 
arating the G and K dwarfs, 80% of our stellar sample 
belong to the G dwarf category. 

In Figure |4] we present the noise distrib ution of the se¬ 
lected GK dwarfs as a function of K ^. iGilliland et al.l 
(l2iO and iGhristiansen et ^ (|2012f) discuss the instru¬ 
mental and astrophysical sources of noise in the Kepler 
data. We also provide mean properties of the GK dwarfs 


TABLE 1 

Kepler GK Dwarf Target Samples 


KIC ID 

Baseline Sample 

Original KIC Sample 

757450 

1 

1 

891901 

0 

1 

891916 

1 

1 

892718 

1 

1 

892772 

1 

1 

892832 

1 

1 

892834 

1 

1 

892882 

1 

1 

892911 

1 

1 

892946 

0 

1 




Note. — This table is available in its entirety in 
a machine-readable form in the online journal. A 
portion is shown here for guidance regarding its form 
and content. 

as a function of Kp in Table [2] In magnitude wide bins. 
Table [2] provides the bin centers, number of targets, mean 
stellar properties (i?*, logg, and Tes), and the 7.5 hr 
robGDPP for the lO**', 50*^, and 90*^ percentiles in each 
bin. Using the pipeline completeness model of Section [2l 
we show in Figure [5] the average pipeline completeness 
for the GK dwarf sample in terms of detection probabil¬ 
ity contour levels over the Torb and Rp parameter space 
examined in this study. The figure does not include the 
effects of the geometric probability to transit. We discuss 
our choice of Torb and Rp space examined for this study 
in Section m 

4. PLANET PROPERTIES 

We measure the plan et occurrence rate usin g the 
Q1-Q16 pipeline run (jTenenbaum et al] I20I4H and 
the resulting Keple r planet candidate sample from 
iMullallv et al.l (j2015[ l. We choose to limit our analysis 
to 50<.Porb<300 days and rocky to mini-Neptune planets 
with 0.75<i?p<2.5 R^. Th e Pnrh range under iny estiga- 
tion has several advantages. IMullallv et al.l l)2015ll classi¬ 
fied the new and pre-existing KOIs into planet candidate 
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TABLE 2 

Kepler Target Summary 


TABLE 3 

Q1-Q16 Planet Candidate Samples 


Ap 

N 

{K) (logg) (Ks) 

robCDFF7.5hr 

robCDPPy.sh, 

r o b (J U B flu m b e r 

Baseline 

Original KiO 

High Hcliability 

bull Long Beriod 

[mag] 


[R«] 

[cgs] 

[K] 

10“'% [ppm] 

50“'% [ppm] 

CD 

O 

1 

0 

0 

0 

0 

8 

9 

0.93 

4.48 

5658 

9.3 

13.1 

27,4i.Ui 

0 

0 

0 

0 

9 

36 

0.90 

4.48 

5543 

12.5 

22.6 

lOO.-Sl-OS 

0 

0 

0 

0 

10 

117 

0.93 

4.46 

5640 

13.1 

23.9 

169.^-01 

0 

0 

0 

0 

11 

364 

0.91 

4.47 

5608 

17.5 

26.8 

142.S1-01 

0 

0 

0 

0 

12 

1312 

0.89 

4.48 

5613 

22.6 

32.1 

67.(70.01 

0 

0 

0 

0 

13 

4964 

0.90 

4.48 

5643 

32.8 

44.9 

78.5’0.03 

0 

1 

0 

0 

14 

16011 

0.88 

4.50 

5625 

51.3 

70.9 

106.'80-05 

0 

0 

0 

0 

15 

41649 

0.86 

4.52 

5563 

88.2 

123.9 

180.12.02 

0 

0 

0 

0 

16 

27027 

0.81 

4.56 

5413 

147.8 

193.4 

271.'35.01 

0 

0 

0 

0 

17 

58 

0.82 

4.54 

5201 

328.0 

456.0 

788.9 



Note. — This table is available in its entirety in a machine-readable form ii 


and false positives for the Porb>50 days regime. Thus, 
the selected period range represents a uniform classifi- 
cation of plan e t cand idates following the procedures of 
iMullallv et al.l (|2015ll . The cumulative KOI catalog for 
7dirb<50 days curren tl y consists of classi fic ations from 
iBatalha et al.l (1201311 . iBurke et al.l (|2014ll . iRowe et 
(j2015D . and IMullallv et al.l (j2015[l . Toward shorter and 
longer Porb) the population of instrumental false alarm 
detection s increases rapidly (see bottom panel of Fig¬ 
ure 4 in iTenenba um et all 2014f). The yetting process 
employed by iMullally et al.l 1)20151) effectiyely remoyes 
much of the instrumental false alarm contamination, but 
the remaining contamination is potentially higher out¬ 
side this Porb range. 

For a majority of Kepler targets, Porb~300 days is 
roughly the transition between planet candidates hav¬ 
ing at least 4-5 transit events contributing to the detec¬ 
tion and planet candidates having the minimum three 
tr ansit events contributing to the detection (see Figure 9 
of IBurke fc McCulloughl[2014H . The three transit event 
(Porb ^300 days) low MES planet candidat es are the most 
challe nging candidates to vet properly (|Mullallv et al.l 
1201511 and further work is needed to understand the false 
alarm rate of this population. Thus, we exclude them 
from planet occurrence rate calculations for the time be¬ 
ing. Third, approximating the behavior of the full CDPP 
time series by the summary robCDPP statistics for the 
pipeline completeness model can be inaccurate at long 
periods when there is an increased chance that the few 
transit events available can occur during outliers of the 
CDPP noise distribution. 

The astrophysical false positive contamination rates 
have observationally (ISanterne et al.l 1^0121: iDesert et al.l 
201511 and theor etically ( Morton &: Johns^ 120111 : 


Fressin et al.l 1201311 been shown to increase towards 
shorter Porb- Also for shorter periods, Porb<3 days, 


the harmonic removal filter in the pi peline increasingly 
remo y es tra nsiting planet signals (jChristiansen et Ml 
20131 1201511. The de tection efficiency reported 


Christiansen et all (|2015ll is calculated for Porb>10 days. 


Thus, the detection efficiency does not take into account 
the impact of the harmonic removal filter. 

In order to select the Kepler planet candidate sample 
for analysis, we must limit the KOI planet candidates to 
o nes recovered in the Q 1-Q16 pipeline run. The analysis 
of iMullallv et al.l (|2015ll uniformly vetted the pre-existing 
KOIs and new KOIs that made a Threshold Crossing 
Event (TCE) corresponding to the KOI ephemeris for 
Porb>50 days. To supplement the list of planet can¬ 


portion is shown here for guidance regarding its form and content. 


didates, we make special exceptions for systems with 
strong transit timing variations (TTV). The pipeline only 
searches for transits with a uniform ephemeris and tar¬ 
gets with high SNR and strong TTVs can result in mul¬ 
tiple TCEs at the incorrect period, but corresponding 
to one to a few of the high SNR transit events. If it is 
clear that the TCE corresponds to one or a few of the 
single events of a TTV system, but formally the TCE 
ephemeris does not match the KOI ephemeris, then that 
TTV KOI planet candidate is accepted as recovered by 
the pipeline and included in our analysis. We provide the 
KOI planet candidates deemed as recovered by the Ql- 
Q16 pipeline run in Table [3l The recovered KOIs within 
the 50<Porb<300 days limits for the baseline occurrence 
rate calculation are indicated by a binary flag in Table [31 
In addition, we provide KOI planet candidates recov¬ 
ered in the QI-Q16 pipeline run in the 10<Porb<50 day 
range in order to support analysis of the Kepler planet 
candidate sample outside the parameter space of this 
study. The 10<Porb<50 day range has not been uni¬ 
formly vetted, so for KOI recovery designations in this 
parameter space we start with the cumulative KOI ta- 


(IBorucki et alJl2011allbl: IBatalha et alJl2013 

: IBurke et alJ 

I20I4 IRowe et al .11 20 151: IMullallv et al.ll2015 

). We employ 

an ephemeris matching routine (Mullallv et al.ll2015fl to 


judge whether a Threshold Cr ossing Event (TCE) de- 
tection from the pipeline run (jTenenbaum et aP I20I4I1 
matches the ephemeris for the KOIs. KOIs with a match 
statistic satisfying that the KOI ephemeris overlaps with 
> 90% of the TCE’s transit events are automatically ac¬ 
cepted as recovered. The KOI ephemeris from a previous 
catalog may not be accurate enough to guarantee 100% 
of the transit overlapping, and the 90% matching level is 
sufficient to automatically adopt the TCE as correspond¬ 
ing to the KOI without manual inspection. Also, special 
exception was made for systems with strong TTVs as 
discussed above. Table [3] lists the KOIs from the cumu¬ 
lative KOI list that are designated as being recovered in 
the Q1-Q16 pipeline run for Porb>10 days. 

For the planet radii, we adopt estimates from 
the uniform KOI a nalysis in IRowe et al.l (j2015[ l and 
IMullallv et al.l (1201511 The technique is described fully 
in IRowe et al.l (j2014f l . Briefly, the flux time series 
data are detrended with a moving cubic polynomial 
fit. Data occurring during transit or near gaps are 
excluded from the moving polynomial fit. Assuming 
a circular orbit, fixed limb darkening parameters from 





















































































Claret fc Bloemeiil (120111), and s t ellar p arameters from 
Huber et al.l (l20lS) . iRowe et alJ (|2014li use a Markov- 
chain Monte-Carlo methodology to estimate the best fit¬ 
ting parameters of a limb darkened transit model. 

Overall, we find 156 planet candidates orbit stars in the 
GK dwarf sample within the Porb, Rp parameter space 
under investigation. We illustrate the planet candidate 
sample (orange points) in Figure [5] In our analysis de¬ 
scribed in Section [5l we do not take into account the un¬ 
certainties on Rp. However, we do explore the influence 
that systematic changes to the planet candidate sample, 
stellar sample, and independent model fits have on the 
resulting planet occurrence rates in Section \S?I\ 

In this study, we do not model or include the impact 
of astrophysical false positive contamina tion in our sam - 
ple. Following the process outlined in iMortc^ (j2012tl . 
a preliminary astrophysical false positive analysis was 
completed for 108 (70%) of the baseline planet candi¬ 
date sample. We find that the average and median false 
positive probabilities for the calculated sample are 4% 
and 0.6%, respectively. Twelve planet candidates in the 
sample have an astrophysical false positive probability 
Pfpp > 10% and the highest is 60%. The astrophysical 
false positive contamination for the parameter space un¬ 
der investigation is within the statistical and systematic 
uncertainties and can be safely ignored for this study. 
However, for shorter and longer Porb, the astrophysical 
false positive contamination becomes increasingly impor¬ 
tant. 


5. PLANET OCCURRENCE RATE METHOD 


In order to infer the underlying planet occurrence rate 
from the observed distribution of Kepl er plane t candi ¬ 
dates, we ex tend the methodology of lYoudinl (|20Iin . 
lYoudinl (|20Iir) present a parametric model for the planet 
distribution function (PLDF) and use likelihood maxi¬ 
mization techniques to estimate the parameters that best 
describe the observed planet candidate distribution and 
the paramete r uncertainties. We extend the method of 
lYoudinl (|20III1 by employing Bayesian parameter estima¬ 
tion theory using Markov Chain Monte-Carlo (MCMC) 
methods to numerically evalua te the posterio r distribu¬ 
tion of the PLDF parameters (iGregorvI 1200511 . We were 
motivated to re place the intuit ive and analytic minimiza¬ 
tion method of lYoudinl (|2011ll with a Bayesian MCMC 
parameter estimation method in order to analyze a more 
complicated PLDF model and enable future efforts to ex¬ 
plore higher dimensional models including, for example, 
dependence on stella r param eters. 

Following lYoudinl (|2011ll . we employ the Poisson 
distribution for the likelihood. A helpful description mo- 


tivating the P oisson likelihood is given in Section 5.3.2 of 
iLoredol (|I992ll , and the Poisson likelihood is commonly 


used in order to interpret astronomical detections with a 
varying survey sensitivity (iTaba chnik fc Tremainell2002l 


AllenI 120071: ICumming et al. 20081: Kraus & Hillenbrand 

2012 

; iForeman-Mackev et al.l I20l4 iNielsen et al. 

2014 

). Also, the point process statistics literature 


( Dalev k, Vere- Joiiei 119881 IBaddele^ 1200711 rigorously 


shows that the Poisson likelihood is appropriate for 
analyses of spatial point data. For this application, 
the observed planet candidate distribution is treated 
as an inhomogeneous Poisson process where the PLDF 
describes the dependence of the Poisson process intensity 


on Porb and Rp. 

Independent of the choice of likelihood, one is free to 
choose any parametric form for the PLDF model. Pre¬ 
vious work has indicated that a power la w form of the 
PLDF describes the Kepler observatio ns (lYoudinl 1201II : 
IHoward et al.l[2Ml : iDoiig fc Zhull20I3ll over portions of 
the Rp and Porb parameter space. For this study, we 
adopt a PLDF dependent upon Porb and Rp parameter¬ 
ized as a power law in Porb and a broken power law in 
Rp over a specified domain Pmin < Porb < Pmax and 

77min ^ Rp — Rm&x- 


dPorbdPp 


Po Cn g(Porb, 7?p) 






(fe)” (^) 


if Rp < Pbrk 
if Rp P Pbrk 

(7) 


where Pq is the integrated planet occurrence rate, Cn 
is a normalization factor, g(x) is the shape function, 
Pq — (Pmin T Pmax)/2 and Rq = (Pmin T Pmax)/2 are 
domain scaling factors, Pbrk is the break radius transi¬ 
tion between the two Rp power law exponents (ai and 
02 ), and P is the Porb power law exponent. The Cn is 
determined from the normalization requirement. 


-Pmax -^max 

J j Cng(Porb,Pp)dPorbdPp = I. (8) 

-Pmin Pmin 


Overall, the PLDF has five free parame ters: Pn , /3, a ^ , 
02 , and Pbrk- Following Equation (18) of lYoudid (j20Illl . 
the Poisson likelihood of the data for a survey that de¬ 
tects A^pi planets around W survey targets is 


L oc 


Npi 


Po'^^‘C„^'>‘n5(7"orb,Pp) 




exp(-iVexp), (9) 


where the PLDF model predicted number of detections 
from the survey is 


N, 


exp 


= PoC„ 



(10) 

and the likelihood ignores constant multiplicative fac¬ 
tors. In Equation (nni, the underlying PLDF model 
is modified by the per-star transit survey effectiveness, 
? 7 j(x), summed over AC targets in the sample, where 
? 7 j(x) = Pj,comp X Pj,tr Is the per-star pipeline com¬ 
pleteness model of Section [2] and Pj^ti is the geomet¬ 
ric probability to transit. The transit probability factor, 
Pj^tr = (P*/a)/(I — e^), dep ends on the stellar parame¬ 
ters and orbital eccentricity (|Burkell2008ll . 

The separable form between Rp and Porb of the PLDF 
adopted in this study, is overly restrictive if applied to 
a larger range of Rp. Previous studies have identified 
a dependence of th e Pnrh exponent, /3, on planet radius 
(|Doiig fc Zhul I20I3I : iForeman-Mackev et al.l I20I4I1 , with 
an apparent transition in the Porb dependence around 
Rp^AR^. For the 0.75<Pp<2.5 P® analysis region of 
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this study we do not find evidence for a more compli¬ 
cated dependence between Rp and Porb being necessary 
based upon residuals between the observed and model 
fitted planet counts. Also of note in Equation (fTOll . is 
that the summation of r]j over the stellar sample is inde¬ 
pendent of the PLDF parameters. Thus, the summation 
can be computed once for the analysis and the planet 
occurrence depends upon the integrated/average transit 
survey effectiveness alone rather than explicitly depend¬ 
ing upon the per-star survey effectiveness. 

We complete the Bayesian posterior calculation by 
specifying uniform priors for all parameters except for 
Fq which has a prior that is uniform in the logarithm. 
The adopted MCMC implementa tion for t his an alysis 
is based upon the description in iGregorvI (12005!) that 
employs a Metropolis-Hastings algorithm with an au¬ 
tomated iterative proposal step-size determination and 
has been applied to transit model light c urve analysis 
(iBurke et al.ll2007ll . transit timing a nalysis (iBurke et al. 
201C ), and radial ve l ocity analysis (|Charbonneau et ^ 
2009 : iBallard et al.l I2011D . For this study, we do 
not incorporate uncertainty in i?pand ignore contribu¬ 
tions to the planet candidate sample due to astrophys- 
ical and instrument al false positives (see lYoudinl 120111 : 
iMullallv et al.l[2Ml . for a more in-depth discussion). In 
the case of multiple planet systems, adopting the Poisson 
likelihood treats multiple planets in a system as indepen¬ 
dent, and thus this method can not capture any struc¬ 
ture and correlations between planet’s in a single system, 
but captures the average behavior over a large sample of 
stars. However, we do constrain the sensitivity of our 
results to these potential complications in Section [521 


6. RESULTS 

In this section, we provide planet occurrence rate de¬ 
terminations base d upon the QI-016 K epler planet can¬ 
didate sample of IMullallv et al.l ( 20I5I 1 (see Section [4] 
and Table El). We focus on the GK dwarf targets ob¬ 
served by Kepler us i ng ste llar parameters from the cat¬ 
alog of iHuber et al.l (|2014il (see Section O and Table |T]). 
We describe our analytic pipeline completeness model 
in Section |2l that employs the pipeline detection effi¬ 
ciency as calibrated by the Monte-Garlo transi t injec - 
tion and recovery provided bv iGhristiansen et al.l (j20I5! ). 
The planet occurrence rate is derived through a param¬ 
eterized model for the PLDF, where the parameters and 
their uncertainties are determined within a Bayesian pa¬ 
rameter estimation problem with the posterior estimated 
through MGMG techniques (see Section [5]). The above 
set of inputs represents our current best/baseline model 
for planet occurrence rates, and we describe the results 
in Section i5n We then perform a sensitivity analysis 
in Section 16.21 in order to explore the systematic uncer¬ 
tainty in the planet occurrence rates due to imprecise 
knowledge of the baseline inputs. 


6.1. Baseline Results 

For the baseline result, we fit the PLDF over the 
parameter space of 0.75<i?p<2.5 Rq and 50<Porb< 
300 days. We tabulate 10,000 subsamples from the full 
MCMC posterior samples for all the parameters along 
with the resulting likelihood and prior values in Table IH 
The overall occurrence rate for this parameter space 


TABLE 4 

PLDF Model Parameter Posterior Samples 


'oi 02 Rbrk 3 Fq 


-1.80587 

-0.91336 

-2.55130 

16.62721 

6.77569 

-1.34402 

-1.90010 

6.59736 

-1.96796 

16.05061 


9.60189 

-7.31895 

-1.50156 

-1.43560 

-1.25637 

-7.41157 

-3.45914 

-1.64782 

-3.36621 

-2.14134 


2.42398 

2.20004 

1.71089 

0.91343 

0.87505 

2.37924 

2.01354 

1.00841 

2.29527 

0.94454 


-0.53218 

-0.68832 

-0.78946 

-0.68223 

-0.61621 

-0.45503 

-0.91872 

-0.55609 

-0.34320 

-0.76077 


1.04356 

0.74751 

1.04314 

0.69378 

0.67331 

0.93427 

0.97944 

0.71677 

1.02223 

0.83308 


Ln(Likelihood) Ln(Prior) 


-1154.1220 

-1152.1463 

-1156.0940 

-1151.8822 

-1153.4823 

-1155.0970 

-1155.2124 

-1152.5281 

-1155.7010 

-1152.5983 


-11.3374 

-11.3374 

-11.3374 

-11.3374 

-11.3374 

-11.3374 

-11.3374 

-11.3374 

-11.3374 

-11.3374 


Note. — This table is available in its entirety in a machine-readable 
form in the online journal. A portion is shown here for guidance regarding 
its form and content. 

Fq = 0.77±0.I2 planets per star. Relying on the statisti¬ 
cal uncertainty alone, the 3 -ct upper limit Fb_3o-u.L. = 1-3 
implies that we cannot currently rule-out a scenario that 
when averaged over large samples of GK dwarfs there ex¬ 
ists more planets in the analyzed parameter space than 
stellar hosts. The 3-a lower limit Fo, 3 (tL.l = 0.49 implies 
that for large samples of GK dwarfs there exists on aver¬ 
age at least one planet in the analyzed parameter space 
for every two stellar hosts. 

Figure [6] shows how well the parametric PLDF 
model predicts the observed, uncorrected Kepler planet 
candidate counts summed over 50<Porb<300 days in 
di?p=0.25 sized bins (points with uncertainties). 
Evaluating the PLDF at the parameters that maximize 
the likelihood fit to the data, we show the model pre¬ 
dicted counts (iVexp of Equation ITOl) where the limits 
of integration are 50<Porb<300 days and di?p=0.25 Rq) 
as the white dashed line. In addition, we show the me¬ 
dian (solid blue line), l-cr (orange region), and 3-cr (blue 
region) model predicted counts by evaluating Wxp us¬ 
ing 10,000 random samples from the posterior PLDF 
parameter estimates from the MCMC chain. Figure [7] 
shows the equivalent information, but along the Porb di¬ 
mension after marginalizing over 0.75<i?p<2.5 i?® and 
dPorb=31.25 days. The bin sizes for the abcissae in Fig¬ 
ures [S] and |7| are chosen in order to balance segmenting 
the parameter space range into a high number of evenly 
sized bins and maintaining at least three detections in 
each bin. 

Figure [5] quantifies the underlying PLDF free of the 
deleterious effects of the Kepler pipeline completeness 
and geometric transit probability. The white dashed line, 
representing the PLDF for parameters that maximize the 
likelihood of the data, rises toward small planets with 
a 2 = —1.8 and has a break near the edge of the param¬ 
eter space. Given the low numbers of observed planet 
candidates in the smallest planet bins, the full posterior 
allowed behavior (l-cr orange region ; 3-cr blue region) 
cannot distinguish between a rising or falling PLDF for 
i?p< 1.5 i?0. Figure [9] shows the equivalent information, 
but along the Porb dimension after marginalizing over 
0.75<Pp<2.5 P© and dPorb=31.25 days. 

Formally, in our baseline analysis of the GK dwarf sam¬ 
ple, the double power law in the Rp model is unwarranted 
relative to a single power law according to the Bayesian 
information criterion (BIG) methodology for model com¬ 
parison. However, we choose to provide the final results 










































10 



Fig. 6 .— Comparison between the predicted planet sample 
from the planet occurrence rate model and the observed Ke¬ 
pler planet candidate sample. The observed, marginalized over 
50<Porb<300 days, histogram of Kepler planet candidate counts 
as a function of Pp (points) can be compared to the maximum like¬ 
lihood model for the predicted counts (white dash line). Also shown 
is the posterior distributions of the model predicted counts for the 
median (blue solid line), 1-cf region (orange region), and 3-cr region 
(blue region) marginalized over Porb in bins of dPp=0.25 Pq. 



Fig. 7. — Same as Figure!^ but marginalized over 0.75<Pp<2.5 
P 0 and bins of dPorb=31.25 days. 


in terms of the double power law model for the follow- 
ing reasons: (a) The additional flexibility of the double 
power law model provides a better fit to the smallest 
i?p parameter space of most interest, whereas the single 
power law model systematically overestimates (by ~0.5 
tr in a comparable data/model comparison to that shown 
in Figurelll) the occurrence rates in the smallest i?p bins, 
(b) The more complicated model ensures the ability to 
adapt to variations in the PLDF in the sensitivity analy¬ 
sis of Section (c) Previous work on Kepler planet oc¬ 
currence rates indicat ed a break in the planet population 
for 2.0 <. Rn5,2.8 Rm (pessin et al.ll2013t iPetigura et al.l 
l2013alfa : lSilburt et al. I2015D . Idl Finally, extending this 


Fig. 8 .— Shows the underlying planet occurrence rate model. 
Marginalized over 50<Porb<300 days and bins of di?p=0.25 i?® 
planet occurrence rates for the model parameters that maximize 
the likelihood (white dash line). Posterior distribution for the un¬ 
derlying planet occurrence rate for the median (blue solid line), 
1-(T region (orange region), and 3 -it regio n (blue region). An ap - 
proximate PLDF based upon results from iPetigura et al.l Il2013bl ) 
for comparison (dash dot line). 



Orbital Period [day] 

Fig. 9.— Same as Figure[ 8 l but marginalized over 0.75<i?p<2.5 
and bins of dPorb=31.25 days. 

work to a larger parameter space and for alternative 
target selection samples, such as the Kepler M dwarf 
sample where a sharp break at j?n^2.5 is obs erved 
(|Dressing fc Charbonneaul 120131 IBurke et al1l2015l) . the 
double power law in Rp is strongly (BIC> 10) warranted. 

Symptomatic of the weak evidence for a broken power 
law model over the 0.75<i?p<2.5 i?® range, i?brk is not 
constrained within the prior i?p limits of the parameter 
space. When i?brk is near the lower and upper Rp limits, 
ai and a 2 also become poorly constrained, respectively. 
To provide a more meaningful constraint on the average 
power law behavior for Rp in the double power law PLDF 
model, we introduce ciavg, which we set to ciavg = cii if 
Rhrk > Rmid and CTavg = 0(2 Otherwise, where i?mid is 
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TABLE 5 

PLDF Model Parameter Summary 



7b 

ai 

OL2 

-^brk 


3 

Likelihood Max 

0.73 

19.68 

-1.78 

0.94 


-0.65 

0.13% 

0.48 




-3.09 

-1.20 

15.9% 

0.66 




-1.97 

-0.85 

50.0% 

0.77 




-1.54 

-0.68 

84.1% 

0.92 




-1.04 

-0.35 

99.9% 

1.32 




0.53 

-0.19 

Lower Limit 

0.28 




-3.25 

-1.4 

Upper Limit 

1.92 




0.53 

-0.10 


the midpoint between the upper and lower limits of Rp. 
We find ttavg = -1-54 ± 0.5 and /3 = -0.68 ± 0.17 for 
our baseline result. We use ctavg as a summary statistic 
for the model parameters only to enable a simpler com¬ 
parison of our results to independent analyses of planet 
occurrence rates and to approximate the behavior for 
the power law Rp dependence if we had used the simpler 
single power law model. The results for a single power 
law model in both i?p and Porb are equivalent to the re¬ 
sults for the double power law model (Pq = 0.83 ± 0.13, 
a = -1.56 ± 0.3, and /3 = -0.68 ± 0.17). 

In Table [SJ we provide the parameters of the PLDF 
that maximizes the likelihood for the data in our base¬ 
line analysis as well as the median and percentile poste¬ 
rior values for Fq, /?, and aavg- Additional statistics for 
the full five parameter PLDF can be estimated from the 
10,000 posterior MCMC samples in Table |H 

6.2. Sensitivity Analysis 

Planet occurrence rate calculations are only as accu¬ 
rate as the inputs. The baseline results of Section [5] rep¬ 
resent our current best set of data that are uniformly ap¬ 
plicable to the Kepler observations and planet search re¬ 
sults. The resulting posterior distribution for the PLDF 
parameters in the above analysis only represent their sta¬ 
tistical precision and do not capture potential sources of 
systematic uncertainties present in the inputs. To ex¬ 
plore the level of systematic errors present in the current 
results, we repeat the baseline analysis, but for several 
scenarios in which we change a single input. The follow¬ 
ing sections describe results of these sensitivity tests. 

6.2.1. Pipeline Completeness Systematics 

The pipeline detection efficiency we employ for the 
baseline analysi s is calibrated with trans it injection and 
recovery tests (|Christiansen et all 1201511 , but it repre¬ 
sents the pipeline response averaged over a wider range 
of Kepler targets than t he limited GK dwa r f sam ple of 
this study. In addition, IChristiansen et ahl (1201511 ana¬ 
lyzed a shorter (four Kepler quarter) subset of the en¬ 
tire Q1-Q16 data. It is expected that the pipeline com¬ 
pleteness primarily depends upon the MES and number 
of transits, thus the results from the shorter four quar¬ 
ter analysis are applicable to the sixteen quarter run. 
However, star-by-star deviations are expected, and until 
we perform larger injection studies it is prudent to in¬ 
vestigate the sensitivity of the occurrence rates to this 
potential source of uncertainty. We consider an opti¬ 
mistic and pessimistic detection efficiency relative to the 
baseline result. For the optimistic detection efficiency, 
we assume the ideal theoretical expected performance of 


TPS given by Equation (|3]). For the pess imistic detec¬ 
tion e fficiency we assume the result from iFressin et al.1 
(|20I3ll . where they find a linear detection efficiency over 
the range 6<MES<I6 provides the best match to the 
SNR dis tribution of the QI- Q6 Kepler planet candidate 
sample ([Batalha et al.1 120I3I1 . The detection efficiency 
of the Q1-Q6 Kepler pipeline was never measured di¬ 
rectly using Monte-Carlo transit recovery tests. Thus, 
we ca nnot determine the accuracy of the IFressin et al.1 
(|20I3ll detection efficiency relative to the QI-Q6 Kepler 
pipeline run. However, having measur ed the detection ef¬ 
ficien cy for the Q1-QI6 p i peline run (|Christiansen et HI 
1201511 . the IFressin et al.1 (I20I3I1 detection efficiency is 
overly pessimistic for the pipeline completeness of the 
Q1-QI6 pipeline run. 

Overall, an overly optimistic detection efficiency re¬ 
duces the planet occurrence rate and a pessimistic de¬ 
tection efficiency increases the planet occurrence rates. 
We show in Figure ITUI the posterior integrated planet oc¬ 
currence rate for the baseline result (orange histogram) 
compared to the case of an optimistic (black line) and 
the pessimistic (black with circles line) detection effi¬ 
ciency alternatives. For this comparison we narrow the 
parameter space of integration (1.0<i?p<2.0 R^ and 
50<Porb<200 days) in order to limit the comparison to 
a region of parameter space with better statistics and to 
facilitate comparison with Kepler occurrence rates from 
previous studies. We symbolize this narrower parame¬ 
ter space planet occurrence rate as Fi. For clarity of 
display in Figure 1101 the optimistic and pessimistic oc¬ 
currence rate posteriors are shown by a log-normal fit 
to the posterior rather than the full posterior detail in 
a histogram format. The pessimistic detection efficiency 
has a >3-(T larger occurrence rate than the baseline re¬ 
sult and the optimistic detection efficiency is 2.5-cr lower 
than the baseline result. This initial test demonstrates 
that systematic effects can be larger than the random 
uncertainties. 

The alternative inputs also influence the other ‘shape’ 
parameters of the PLDF. We show samples from the pos¬ 
terior distribution of aavg (Figure [H]) and (3 (Figure [12]) 
as a function of Fq for the baseline (orange points) occur¬ 
rence rate parameter estimates. As an approximation to 
the joint 2-cr posterior distribution we model the poste¬ 
rior as a multi-normal distribution with major and minor 
axes along the eigenvectors determined from the poste¬ 
rior samples (orange ellipse). For comparison, we show 
the optimistic (black ellipse) and pessimistic (black with 
circles ellipse) detection efficiency solutions by the 2-a 
ellipse model for the shape parameters. The systematic 
variations of aavg and (3 are correlated with Fq. 

6.2.2. Orbital Eccentricity 

In the baseline result we have assumed circular orbits 
when constructing the model for pipeline completeness. 
However, radial velocity studies have reveale d that eccen¬ 
tric o rbits are common for Porb >10 days (|Butler et al.l 
Hool. A nonzero eccentricity results in higher proba¬ 
bility to transit, but a shorte r transit durati on degrades 
the transit SNR (jBurkell2008l l. iBurkel (|2008[ ) shows that 
yields from a transit survey could be up to 25% higher 
using the observed distribution of radial velocity plan¬ 
ets. We investigate a limiting case of assuming all plan¬ 
ets have e=0.4. The e=0.4 case results in an 11% (I-cr) 
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Fig. 10.— Posterior distribution for the integrated planet occur¬ 
rence rate over the 1.0<ilp<2.0 and 50<Porb<200 days pa¬ 
rameter space, Fi . Changes from the baseline inputs (filled orange 
histogram) systematically impact the derived occurre nce rate be¬ 
yond the statistical uncertainty. We discuss in Section [6]2] alterna¬ 
tive inputs: optimistic detection efficiency (black line), pessimistic 
detection efficiency (black with circles line), assuming e=0.4 for all 
orbits (black with triangles line), original KIC stellar parameters 
(yellow line), alternative DV Rp (yellow with circles line), low reli¬ 
ability planet candidate sample (blue line), high reliability planet 
candidate sample (blue with circles line), and assuming a single 
planet search (red line). 

2 

1 

0 


-2 

-3 

'"^0 0.5 1 1.5 2 

Planet Occurrence; 

0.75<R <2.5 R^ ; 50<P ^,<300 day 
p © orb * 

Fig. 11. — Samples from the posterior distribution of Fq as a 
function Qavg for the baseline results (orange points) along with 
an approximate 2-cr error ellipse for the baseline results (orange 
ellipse). Also shown are 2-cr error ellipses for the alternative inputs 
with the same line types as in Figure FTol 

lower planet occurrence rate (black with triangles line in 
Figure [Toll. Thus, for this parameter space the system¬ 
atic effect due to orbital eccentricity is comparable to the 
statistical errors. The impact of eccentricity on Oavg and 
fi is also modest. 

6 . 2 . 3 . Stellar Parameter Systematics 

Stellar parameter estimates of Kepler targets are 
subject to systematic uncertainties (iBrown et al.l 


Baseline 

Baseline 2-c ellipse 
Optimistic Efficiency 
Pessimistic Efficiency 
Eccentric 
Original KIC 
DVRp 

Low Reliability 
High Reliability 
First Planet Only 



Fig. 12.— Same as Figure fTTl but showing the posterior distri¬ 
bution of Fq as a function /3. 


2011 

IMann et al.l 

. l2012t 

Pinsonneault et al.l 

[20121: 

201.31 

[Everett et al.l 

[20131: 


iMuirhead et al.l 120121: 

[Dressing &: Charbonneaul 


and multiplici t y/ble n d effects ([Cartier et al.l 1201 


iLillo-Box et al.l l20l4 iCiardi et al.l 1201511 . Als o, the 
stellar parameter catalog of iHuber et al l (|2014f) relies 
upon a heterogeneous compilation of input sources and 
still has some limitations (see their Section 8 for a 
discussion). As a proxy for constraining the impact 
on occurrence rate studies due to stellar parameter 
systematics, we repeat the analysi s but adopt stella r 
parameters from the original KIC (|Brown et al.ll201lll . 
Using the original KIC is also germane since it was 
employed for previous work on planet occurrence r ates 
with Kepler ((Howard et al.l[2(iTa iFressin et al.ll2013i l. 

We redo the GK dwarf target selection resulting in 
102,186 targets that meet the selection criteria. There 
are 83,724 targets (91.4%) in common with the baseline 
GK dwarf sample discussed in Section [S] Table [T] pro¬ 
vides a binary flag to indicate that the Kepler target was 
included in the stellar sample based upon the original 
KIC stellar parameters. In the original KIC GK dwarf 
sample there are 177 planet candidates that have 122 
planet candidates (78.2%) in common with the baseline 
planet candidate sample discussed in Section |4l Table [3] 
has a binary flag indicating the planet candidates se¬ 
lected for this original KIC stellar sample. We adjust 
the derived R-p of the planet candidate sample by lin¬ 
early scaling Rp by the ratio in i?* between the baseline 
and the original KIC values. 

The net impact of the alternative stellar parameters 
results in ^ 2cr higher occurrence rates (yellow line in 
Figure ITOl). The change in Oaw is larger than for /3 (yel¬ 
low ellipse in Figures [TT] and [12]). 


6 . 2 . 4 . Planet Candidate Parameters 


Planet radii are not a direct observable, and they must 
be derived through parameter fits to light curves with 
various assumptions as to the stellar parameters, limb 
darkening coefficients, flux time series detrending, treat¬ 
ment of instrumental effects, orbital eccentricity, and 
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third light contamination to name a few 

Mandel & Agol 

20021 ISeager & Mallen-OrnelasI 120031: 

Gimened 

2006; 

Holman et al.ll2006l:IBurke et al.ll2007l: [Torres et al.l 

2008; 

SonthworthI 20111: iKippingl 2014 IRowe et al.l 

2015; 

Mnllallv et al.l 120151: iCiardi et al.l 1201511. In our cur- 


rent analysis, we treat Rp as perfectly known without 
uncertainty. Recent work has pointed out the non- 
negligible bias in deriving planet occurrence rates with¬ 
out taking into account the full error distribution of 


(IMorton & SwiftI 120141 IForeman-Mackev et al. 

[2014 

ISilburt et al.1120151: [Dressing & Charbonneaull2015 

1. De- 


tailed planet parameter posterior estimates have only 
recently become availabl e for a majority o f the Kepler 
planet candidate sample (iRowe et al.ll2015h . thus we de¬ 
fer occurrence rate analysis using a full posterior distri¬ 
bution of planet radii for future work. 

We repeat the occurrence rate calculation using the al¬ 
ternative Rp estimates provided by the Data VaUdatio n 
(DV) module of the Kepler pipeline (|Wu et al.l 1201011 . 
The most important differences between t he DV analysis 
and th e b aseline plane t para meters from iMullallv et ahl 
(|2015f) and I Rowe et al.l (|2014n are the independent meth¬ 
ods of detrending the flux time series data and DV use of 
minimization instead of a MCMC analysis. Both anal¬ 
yses assume the same stellar parameters, fixed limb dark¬ 
ening coefficients, zero eccentricity, and begin with the 
pre-search data conditi o ning time serie s llStumpe et al.1 
I2OI2I : [Smith et al]l2012| l. IMullallv et al.l (j2015H find that 
the radii ratios, from the MCMC analysis are 

'^7% smaller than from the analysis in DV. The typi¬ 
cally larger Rp from DV results in ~ 2.2tT lower occur¬ 
rence rates (yellow with circles line in Figure ITUl) . The 
change in Oavg is larger than for /3 (yellow with circles 
ellipse in Figures ITT] and IT^ . 

6.2.5. Planet Candidate Sample 

Characterizing a detection by the Kepler pipeline as a 
bona fide member of the Kepler planet candidate sam¬ 
ple has increasi ngly relied upon an au t omated classifica¬ 
tion procedure (iMcCanliff et al.ll2ni4 IRowe et al.ll2ni5l: 
Mnllallv et al.1 120151 : iCoughlin et al.l 120151 : IRowe et al.1 
201511 . However, the accuracy, efficacy, and impact on de¬ 
riving planet occurrence rates due to the automated clas¬ 
sification and remaining manual vetting decision steps 
have not been fully quantified. The vetting process has 
its own false negative alarm rate outside of the pipeline 
completeness, that currently we do not account for. In 
addition, the decision process for both the automated 
and manual decision methods becomes increasingly less 
definitive towards low SNR (see the discussion of the 
cu rrent planet sample limitations in Sections 7 and 9.1 
of [Mullally et al. [201^. The planet candidate catalog 
of IMullallv et al.l (|2ni5h takes an ‘innocent until proven 
guilty’ approach to deal with the indeterminant diag¬ 
nostics in the low SNR regime. The instrumental aper¬ 
ture contamination and crosstalk also become increas¬ 
ingly difficult to identify at low SNR (|Coughlin et al.1 
12014) . We constrain the potential systematics in de¬ 
riving planet occurrence rates due to uncertainty in the 
planet candidate sample classification process by consid¬ 
ering two alternative planet samples. 

First, we include a population of twelve ‘lower re¬ 
liability’ KOls with a false positive disposition in the 
50<Porb<300 days and i?p<2.5 R^ parameter space un- 


TABLE 6 
Low 

Reliability 
KOI False 
Positive 
Sample 

KOI number 

4954.01 

5043.01 

5081.01 

5102.01 

5123.01 

5177.01 

5198.01 

5210.01 

5257.01 

5309.01 

5325.01 

5405.01 


der investigation (see Table El). This sample of ‘lower 
reliability’ KOIs were characterized as pla net candidates 
for all the vetting procedures described in IMullallv et al.1 
(|2015f) except one. These KOIs are false positives be¬ 
cause they failed to maintain an SNR> 7.1 in the in¬ 
dependent detrending employed for the MCMC planet 
parameter estimates (IRowe et al.l 12014( 1. Prior to the 
MCMC evaluation, a trial minimization provides a pa¬ 
rameter initialization. These lower reliability KOIs failed 
to yield SNR> 7.1 in this trial fit and were therefore de¬ 
moted from a PLANET CANDIDATE to a FALSE POS¬ 
ITIVE disposition. Requiring an independent recovery 
of a potential detection is a valuable criteria for a planet 
candidate, but it largely impacts our lowest SNR detec¬ 
tions and we have not fully quantified the false negative 
rate of this independent recovery test. In lieu of a more 
detailed investigation, it provides a useful limiting test 
case sample to constrain the potential breakdown of the 
vetting metrics at the lowest SNR of the planet candidate 
sample. Including a lower reliability KOI sample in the 
planet candidate list, results in ^ Icr higher occurrence 
rates (blue line in Fignre fTOll and modest changes in Oavg 
and /3 (blue ellipse in Figures |TT] and |T2]). 

Second, we cull the baseline KOI planet candidate sam¬ 
ple to the most reliable detections by requiring KOIs to 
have been detected in at least one other pipeline run. 
Each run of the Kepler pipeline is independent and has 
different amounts and versions of the data. To remain in 
the ‘high reliability’ planet candidate sample, we require 
a KOI to be represented as a TCE in either the Q1-Q1 2 
pipeline run ([Tenenbaum et al1l2013l: IRowe et al]l2015ll . 
the Q1-Q17 pipeline run ([Seader et al.l 1201511 . or a test¬ 
ing/development run using Q1-Q17 data with a near- 
hnal Kepler pipeline code version. This requirement re¬ 
moved 26 KOI planet candidates in the 50<Porb<300 
days and i?p£2.5 R^ parameter space under investiga¬ 
tion. Table [3] contains a binary flag indicating the planet 
candidates belonging to this high reliability planet sam¬ 
ple. Adopting a higher reliability KOI sample results in 
~ 2.2cr lower occurrence rates (blue with circles line in 
Figure [Toll . The change in Oavg and /3 (blue with circles 
ellipse in Figures ITT] and [T^ are consistent with prefer¬ 
entially removing the lower SNR KOIs which typically 
reside at smaller radii and longer orbital periods. 

The final systematic we investigate is the impact of 
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limiting the search to a single planet per target, effec¬ 
tively ignoring the multiple planet search in the Kepler 
pipeline. We provide this result in order to more di¬ 
rectly compare independent analyses of t he Keyler data 
that d o not search for multiple planets (IPetigura et al.l 
l2018bll . IPetigura et al.l (j2013b[ l estimate that their oc¬ 
currence rates would be ~25% higher by including mul¬ 
tiple planet systems in their study. We concur with their 
estimate by finding a 25% (~ 2.2cr) lower occurrence rate 
by only including the lowest numbered KOI (typically the 
highest SNR) of a system (red line in Figure [T0)l . The 
change in Oavg is negligible and P prefers a more grad¬ 
ual decrease in planet occurrence with Porb despite the 
lower overall occurrence rate normalization (red ellipse 
in F igures [TT] and [TH) . 

6.2.6. Systematic Error Summary 

The previous sections show that individual systematic 
effects can reach 2a biases in the occurrence rates, where 
a is determined from statistical errors alone. Unfortu¬ 
nately, multiple systematic effects can add coherently 
rather than quadratically (see Section [7]). To provide a 
more realistic uncertainty in the context of all these sys¬ 
tematic uncertainties, we express the uncertainties on the 
occurrence rate parameters as an acceptable range. We 
adopt the lower and upper limit of the acceptable range 
as the 2a lower and upper limit for the largest systematic 
effect calculated in the previous sections. Based upon the 
results in this section, we find that the planet occurrence 
rate for the 1.0<i?p<2.0 i?® and 50<Porb<200 days pa¬ 
rameter space to have a best value from the baseline 
calculation of Fi = 0.34 with an acceptable range of 
0.19< Fi <0.7. We provide acceptable ranges for the 
PLDF model parameters in Table [SJ 

7. DISCUSSION 

In this section, we compare our PLDF to previous work 
on the Kepler target sample that included analysis of the 
G dwarf targets using at least twelve quarters of Kepler 
data. We generally find higher occurrence rates, no evi¬ 
dence for a break at Rp^2.5 increasing planet occur¬ 
rence rates towards i?p=I.O Rq, slightly shallower drop¬ 
off of occurrence rates towards longer Porb, and larger 
uncertainty on occurrence rates driven by systematic ef¬ 
fects. 

As a primary source for comparison, we compare to the 
in dependent pipel i ne anal ysis on planet occurrence rates 
bv IPetigura et al.l (|20I3hH . We compare the integrated 
planet occurrence rate over the I.0<Pp<2.0 P® and 
50<Porb<200 days r ange, Pi , in Figure Hffl We approxi¬ 
mate the result from IPetigura et ^ (l20I3b( l (Pi=9±3% 
occurrence rate) as a Gaussian (black line) with value 
and uncertainty as published from their Figure 2. The 
posterior distribution of our baseline result (orange his¬ 
togram) demonstr ates a significant differ ence from the 
occurrence rate of IPetigura et al.l (|20I3bH . For consis¬ 
tency with the TERRA pipeline (which does not search 
for multiple planets), we show our alternative occurrence 
rate after keeping only the highest SNR planet candidate 
for a target (red line) in Figure [131 The ‘first planet 
only’ occurrence rate does not fully remove the differ¬ 
ence. In our analysis we explored numerous alternative 
inputs (see Section [6.21) . Even when assuming a wide va¬ 
riety of systematics, we have a difficult time reconciling 



Fig. 13. — Comparison of the integrated posterior distribu¬ 
tion from our baseline PLDF over the 1.0</?p<2.0 and 
50<Porb<200 days parameter space (orange filled histogram) to 
previou s results over the s ame parameter space f rom IPetigura et "al] 
j2013bl'l (black curve) and IMulders et aTI l|2Q15l'l (blue curve). The 
posterior distribution from the other works are approximated as 
Gaussians with their central and standard deviation parameters 
as published. We also show our single planet search results (red 
line) and an extreme scenario where the four leading systematics 
resulting in lower occurrence rates are combined (red with circles 
line, see Section [7|). 

our results with IPetigura et gill (j20I3bl l. 

It is possible that several sources of systematics add 
coherently to reconcile the result s. For instance , we can 
reproduce the occurrence rate of iPetigura et al.l (120131)1) 
(black line in Figure IT^ by combining together four of 
the systematic effects resulting in lower occurrence rates: 
single planet search only, alternative DV Rp, highest re¬ 
liability KOIs, and optimistic detection efficiency. The 
resulting occurrence rate (red with circles line in Fig¬ 
ure dsi) Fi = 0.09 is less than our lower limit of an 
acceptable range Fi > 0.19. Further work is needed 
to understand the differences betwee n our results and 
the results of IPetigura et al.l (j20I3b[ l. Some possibili¬ 
ties including increasing the number of injection and re¬ 
covery trials, characterizing the impact of the flux time 
series detrending on planet recovery, investigating sys¬ 
tematic differences between the stellar parameter esti¬ 
mates of the planet candidate hosts and non planet can¬ 
didate hosts, and better characterization of the biases 
that may be prese nt when using the bin ned occurrence 
rate methodology (iMorton fc SwiftI 1201 411 . For instance, 
iForeman-Mackev et al.l (l20I4f) note a bias in the binned 
occurrence rate methodology is present if the complete¬ 
ness function is evaluated at exactly the location of the 
planet parameters rather than being averaged over the 
entire bin of analysis. 

One characteristic result of IPetigura et al.l l)20I3bD is 
a plateau to declining occurrence rates in the mini- 
Neptune to terrestrial planet regime. To enable com¬ 
parison of the results from our parametric PLDF model, 
we derive an approximate PLD F consistent with the oc - 
currence rate from Figure 3 of IPetigura et al.l (l20I3b[) . 
marginalized over 5<Porb<100 days. We determine a 
PLDF with two free parameters, Pp = using 
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the l<J ?n<l-4 Rm and 2<i?r> <2.8 bins from Fig¬ 
ure 3 of iPetigura et al.l (|2013b[) . yielding = —0.3677 
and F-p = 0.103. The approximating PLDF yields an 
occurrence rate of 14.9% for the 1.4<i?p<2.0 i ?0 bin 
compared to the pub lished value of 14.2±1.0% from 
IPetigura et al.1 ()2013bf ). As a further check, the approx¬ 
imating PLDF yields a Fi=12.5% occurrence rate com¬ 
pared to Fi=9±3% for the pu blished 1.0<i7n<2.0 
and 50<Porb<200 days rate of IPetigura et all (l2013bll . 
The derived ap is ~ 2-(7 (statistical uncertainty alone) 
shallower than our Oavg = —1.5 power law dependence 
of the occurrence rates on Rp. However, ap is consistent 
with our results for Oavg if one considers the full system¬ 
atic range -3.25< O avg <0.53 allowed from the sensitivity 
analysis of Section 16.21 A similar comparison applies to 
Porb dependence of the PLDF for the /? parameter. Our 
P = — 0.68 is ~ 2-(7 (statistical uncertainty alone) shal- 
lo wer than the Pp = — 1 dependence qualitatively stated 
in IPetigura et al.l (|2013bll . However within the full range 
allowed, -1.4< /3 <-0.1, the two values agree. We show 
the approximating PLDF (dash dot line) for comparison 
to our result in Figure |S1^° 

If we use our baseline inputs to fit the broken pow- 
erlaw in Pp PLDF over a larger, 0.75<Pp<5.0 P®, pa¬ 
rameter space, we do find decisive evidence, BIC>10, 
for the broken power law model over a single powerlaw 
in Pp PLDF. The derived Pbrk = 3.3^04 Pe, with a 
ai = —1.72 ± 0.3 power law dependence for Pp< Pbrk 
and a 2 = — 6 . 6 ± 1.7. The planet occurrence rate derived 
from this study is consistent with a power law break, 
but we find that it qualitatively o c curs at a larger radius 
than the study of IPetigura et al.l (|2013all (Pbrk ^ 2.5), 
but is consi stent with the qual itatively stated break at 
Pbrk 3 of iDong fc ZkJ (l 2 (^ . 

We also compare to the integr ated occurrence rate 
from the Kepler G dwarf sample of iMulders et al.l 1)20151 1 
(blue li ne) in Figure 1131 To estimate a value from Ta¬ 
ble 7 of iMulders et al.l()2015H . the 150<Porb<250 bin was 
weighted by 0.56 assuming a Porb”^ PLDF dependence 
across th e bin. This oc c urrenc e rate is in between the 
results of iPetigura et al.l l)2013bll and this study, and has 
uncertainty overlap with both studies especially when 
considering the systematic s ources of error . We f ind a 
very similar result between iMulders et al.l (|2015ll and 
iDong fc Zhul (1201311 for the o ccurre nce rate in this pa¬ 
rameter sgane^ SilbuHie^Qj 201 ^ find results compa¬ 
rable to lF*etigura et al.l '( 2013b l. 


8. EXTRAPOLATION TO LONGER PERIODS 

In this section, we compare the observed Ql- 
Q16 planet candidate sample at longer periods 
(300<Porb<700 days) to the predicted planet candi¬ 
date yield deduced by extrapolating the PLDF model 
with parameters determined from the shorter period 
(50<Porb<300 days) parameter space. In our baseline 

The rising slope toward smaller Rp of the approximating 
PLDF model from the IPetigura et al.l Il2013bl l results is visually 
in consistent with t he dec reasing occurrence rate shown in Figure 3 
of iPetigura et aLl Il2013bl l. but the visual inconsistency arises due 
to our adoption of linear bin widths for this study a nd the adoption 
of logarithmic bin widths of iPetigura et aP Il2ni3bll . Thus, a = 0.0 
corresponds to a flat occurrence rate in the linear bin widths of 
this study and a = —1.0 would correspond to a fiat occurrence if 
we were to adopt logarithmic bin widths. 
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Fig. 14.— Average pipeline completeness contours for the GK 
dwarf sample toward longer, 300<Porb<700 days, along with the 
Q1-Q16 Kepler planet candidate sample (orange points). 

study, we purposely avoided the longer period regime 
because the planet candidate sample with three transit 
events and low MES has the potential for a substantially 
hi gher false alarrn rate (see the discussion in Section 9.1 
of iMullallv et al.ll2015D . In previous planet candidate 
samples, the false alarm rate was minimal since a KOI 
detection from an earlier pipeline run could be compared 
to a later pipeline run with substantially more data avail¬ 
able. With the ending of the Kepler primary mission, 
further data beyond Q1-Q17 is not available to verify 
our lowest MES detections. 

EigureHHshows the average pipeline completeness con¬ 
tours toward longer Porb for the GK dwarf sample of this 
study along with the Kepler planet candidate sample in 
this regime. Using this long period pipeline complete¬ 
ness model and the shorter period PLDF model, we pre¬ 
dict the expected planet candidate yield for Kepler. The 
top panel of Figure [15] shows the difference between ob¬ 
served and predicted planet candidate counts marginal¬ 
ized over 300<Porb<700 days. There is a statistically 
significant overabundance of planet candidates toward 
longer periods than predicted from the baseline PLDF 
derived at the shorter orbital periods. The largest dis¬ 
crepancy is for the smallest Rp bin under consideration 
in Figure HSl The bottom panel of Figure |T5] shows the 
observed minus predicted planet candidate counts as a 
function of Porb after marginalizing over 0.75<Pp<2.5 
Rq . The most signiheant overabundance is in the middle 
Porb bin. The largest contributor to the overabundance 
are the cluster of hve planet candidates around Pp^l.l 
Rq and Porb~500 days that fall along the ( 0 . 01 ) average 
pipeline completeness contour level. 

The significant overabundance of planet candidates 
implies that extrapolations of our PLDF from the 
50<Porb<300 days range may underestimate the planet 
occurrence rates toward longer periods. However, at 
this time we cannot distinguish between a higher occur¬ 
rence of planets toward long periods in the Kepler GK 
dwarf planets, a larger false alarm contribution among 
the lowest MES planet candidates, or systematic bias 
in our simplified pipeline completeness model. We are 
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Fig. 15. — Top: marginalized over periods of 300<Porb<700 days 
observed Kepler planet candidate counts minus the predicted 
planet candidate counts obtained by extrapolating our planet oc¬ 
currence rate results from the shorter 50<Porb‘<300 days analysis 
of Section 0 Bottom: same as top, but marginalized over planet 
radius 0.75<Pp<2.5 Pq. 


Fig. 16.— Distribution for the 0.6 yr terrestrial planet occur¬ 
rence rate, C 0.65 integrated within 20 % of i?p=l and Porb$? 
using the baseline analysis (filled orange histogram). Solid lines 
represent re sult s using alternative inputs with the same line types 
as in Figure [To] 

“ the Venus zone (r] q) concept of [Kane et all (j2014D or the 
HZ (nm) concept feasting et al.lll99.'ll: iSelsis et al.ll2?)(y^ 

iZsom et 'mi201,‘H iKopparapn et al.ll201,‘H) . Stellar insola- 

tion flux is an indirectly measured quantity and 775 and 
770 depend upon uncertain theoretical models for terres- 
- trial planet atmospheric evolution. Providing occurrence 
rates in terms of Porb facilitates comparison with future 
Kepler occurrence rate studies and is readily compared 
to theoretical terrestrial planet formation models. 

We defer the additional complications in calculating 
77 ^ and 770 to future work. Despite the complications, 
800for G dwarfs, Co. 6 is a subset of the full 77 ^ parameter 
space, thus Co. 6 places a valuable lower limit on 77 ^ for 
G dwarfs. For the K dwarfs, Porb 9 corresponds to the 
Sun-Earth insolation flux. Thus, Co.6 is a lower limit on 
the K dwarf 779 . We find Co. 6 = 0.075 with an acceptable 
range of 0.013< Co.6 <0.30, and show the baseline and 
systematic posterior distributions for Co. 6 in Figure [TH 


investigating flux time series inversion and permutation 
tests along with a bootstrap noise characterization test 
(|Seader et al.ll2015ll in order to calibrate the false alarm 
rate in the Kepler planet candidate sample. 

9. TERRESTRIAL PLANET OCCURRENCE RATE FOR 
VENUS ORBITAL PERIODS 

Earth’s sister planet, Venus, has an orbital period 
within the Porb range of the baseline analysis. Thus, 
in this section we present results for the occurrence rate 
of terrestrial planets corresponding to the Porb~ 0.6 yr of 
Venus. We define Co.6 as the 0.6 yr terrestrial planet 
occurrence rate, which we take to be within 20 % of 
Pp=l P 0 and 20% of Porb^- The integral range of 20% is 
within t he expectations for the regime of rocky te rrestrial 
planets (|Rogersl 120151 : iWolfgang fc Looe^ l2014f) . Since 
Porb is a direct observable, providing occurrence rates in 
terms of Porb such as C; has advantages over providing 
occurrence rates in terms of stellar insolation flux, such as 


10. TERRESTRIAL PLANET OCCURRENCE RATE FOR 
ONE YEAR ORBITAL PERIODS 

10 . 1 . Extrapolating to One Year Orbital Period 

The longer, 300<Porb<700 days parameter space 
roughly coincides with the theoretical HZ for the G dwarf 
targets, which is a preferred location in a planetary sys¬ 
tem for a stable water bearing planet feasting et al .1 
1 1993(1 . In Section [H we demonstrated that determin¬ 
ing the planet occurrence rate in the 300<Porb<700 days 
range directly from Kepler data is at a premature stage 
due to significant false alarm contamination. In this sec¬ 
tion, we extrapolate our PLDF parametric model in or¬ 
der to calculate two occurrence rate parameters that can 
be used as a baseline for comparison to future work on 
rehning HZ occurrence rates. 

First, we measure the PLDF eyaluated at 1.0 Pm 
and Pnrb=365.25 days, F^ = dN/d lnPnrhdlnRr, ((Youdinl 
I2011I iForeman-Mackey et al.l[20i4f) . In the top panel of 
Figure[T7l we show the baseline (filled orange histogram) 
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r 0 determined by extrapolating the PLDF models from 
the 50<Porb<300 days results. We also show the alter¬ 
native systematic effects discussed in Section 16.21 (solid 
curves). Note that the logarithmic scaling for the ab¬ 
scissa indicates substantial systematic uncertainty in the 
results due to the extrapolation. We also show results 
for an extrapolated one year terrestrial planet occur¬ 
rence rate, ^i.q, defined as the occurrence rate of a planet 
within 20% of the Earth’s radius and Porb in the bottom 
panel of Figure [T71 for the baseline (filled orange his¬ 
togram) and alternative systematic effects discussed in 
Section 1121 (solid curves). For clarity the effects of ec¬ 
centricity and for ‘first planet only’ are not displayed in 
Figure ITTI as the results are within the statistical uncer¬ 
tainty of the extrapolated baseline result. 

10 .2. Directly Measured At One Year Orbital Period 

Though the level of systematics present in our analy¬ 
sis are substantial towards longer periods, we repeat the 
PLDF parameter estimation in the 0.75<i?p<2.5 Rq and 
300<Porb<700 days parameter space. We show the aver¬ 
age pipeline completeness for the long period parameter 
space in Figur e [HI The plan e t can didates from the Ql- 
Q16 catalog of iMullallv et al.l (|2015D are shown as orange 
points and are indicated by a binary flag in Table |3l The 
analysis yields a high Pq = 4 . 7 ±f ;77 planets per star, 
significantly steeper Oavg = —4.02 ± 0.8 and shallower 
/3 = 0.92 ±0.8, where the errors are the statistical uncer¬ 
tainty alone. 

We defer a more detailed analysis of the systematics 
to future work, but as a first step we consider culling 
the planet candidate sample of the five planet candi¬ 
dates along the 0.01 pipeline completeness contour shown 
in Figure [HI As discussed in Section 13 this cluster of 
five planet candidates represents a significant overabun¬ 
dance of planet candidates relative to our shorter pe¬ 
riod analysis. The overabundance relative to the shorter 
period extrapolation prediction is nearly erased (1.5-cr 
overabundance), if the cluster of five planet candidates 
is removed from the sample. The KOIs belonging to 
the trimmed long period planet candidate sample are 
indicated by a binary flag in Table |3 The PLDF pa¬ 
rameter estimation after removing these five planet can¬ 
didates yields Pq = l-7±o;6, aavg = —2.7 ± 1.1, and 
P = 0.4 ± 0.8. From this direct analysis we show the 
one year terrestrial planet occurrence rate in the bottom 
panel of FigurejT^for Ci.o = 0-76 ±q( orange dash line) 
and Ci.o = 0 . 21 ±q ;25 (black dash line), evaluated using 
the full and clipped Kepler planet candidate sample in 
the 300<Porb<700 days parameter space, respectively. 

10.3. One Year Terrestrial Planet Occurrence Rate 

Summary 

The wide range of occurrence rates obtained from this 
study is a consequence of the difficulties associated with 
extrapolating, small number statistics, and systematics 
(including false alarm reliabilities). This will impact re¬ 
fining ^ 1.0 and HZ statistics in future studies of the Ke¬ 
pler data set. Compiling our results of the extrapolated 
and direct analyses, we find Ci.o = 0.1 with an allowed 
range of 0.01< Ci.o <2. Dynamical s imulations cannot 
rule out an upper limit of Ci.o ±2 ([Smith fc Lissarieil 
|2009|1 . The mutual hill radii separation for a system 


0. 01 0.03 0.10 0.32 1.00 3.16 10.00 

— Optimistic Efficiency - - - Direct Baseiine 

— Pessimistic Efficiency - - - Trimmed Direct Baseiine 
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Fig. 17. — Top: distribution for the PLDF evaluated at the 
Rp and Porb of Earth, r 0 , using the extrapolated baseline analy¬ 
sis (filled orange histogram). Solid curves represent resu lts u sing 
alternative inputs with the same line types as in Figure [ini We 
also show two alternative analyses that directly measure, with¬ 
out extrapolation over Porbi T© from the Q1-Q16 Kepler planet 
candidate sample. The direct measurement of Fq using the full 
long period planet candidate sample (orange dash curve) and the 
trimmed long period planet candidate sample (black dash curve) 
result in higher F^ than the extrapolated PLDF results, respec¬ 
tively . Previous Fm d e termin a tions from |Foreman::Mackey_eL_alJ 
1201^, IPetigura et al.l Il2ni.3bl i. IDong Kr, Zhul ll2ni.'ll L and lYoudinl 
11201 II ) are shown with vertical lines as labeled. Bottom: same as 
top, but for the one year terrestrial planet occurrence rate, ^i.p. 


of three Mp=l planets within the Ci.o occurrence 
region of a G dwarf is >9 corre sponding to ~ 10^° yr 
stability ([Smith fc Lissau^l2009f) . However, for a lower 
mass K dwarf host and larger (i?p= 1 . 2 i? 0 ) planets the 
mutual hill radii separation ^ 7 for a triple planet sys¬ 
tem in the Ci.o zone would likely be unstable on a 10 ® yr 
timescale. 

For the PLDF value at the Rp and Porb of Earth, 
we find F^ = 0.6 with an acceptable range from 
0.04< F 0 <11.5. For comparison with previous stud¬ 
ies, we show in the bottom panel of Figure [17] as ver- 
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tical l i nes estimates o f Fm fro m iForeman-Mack ev et alJ 
(I2014D [Petigura et al.l (|20i3bD . [E)ong fc Zhul (|2013D . and 
lYoudirJ (12011 ) from left to right, re spectively- In order 
to calculate results for F 0 from the iDong fc Zhd (120131 1 
study, we extrapolate their parametric power law model 
as given for the l<i?p<2 ana lysis from their Table 2. 
The c entral value for Fq from iForeman-Mackev et al.l 
(|2014ll is in tension with our analysis, but there is over¬ 
lap in the upper tail of their posterior with ou r lower 
limits. The analysis of lForemamMackev£t_a^ (j2014f l 
used the same inputs fromiPetigura et al.l (|2013hll . How¬ 
ever, IForeman-Mackev et al.l ()2014h determine that find¬ 
ing a stee per fall off of o c currenc e rates toward longer 
Porb than iPetigura et al.l l)2013bD and taking into ac¬ 
count uncertainty on planet ra dii lead to a systemat - 
ically lower value for Fq than IPetigura et al.l (j2013b[ l 
when starting from the same inputs. Further work is 
needed in order to isolate whet her the differences between 
IForeman-Mackev et al.l (|2014ll and our study results pre¬ 
dominately from differing inputs or methodology. The 
other results for Fq from the literature are consistent 
with our allowed range of Fq . 


11. CONCLUSION 


In this study we make use of the first Kepler pipeline 
run using nearly all (Q1-Q16) the extant Kepler data 
in order to measure the planet occurrence rate for 
0.75<i?p<2.5 i?©in the 50<Porb<300 days range or¬ 
biting the GK dwarf Kepler sample. We employ the 
first characterization of the Kepler pipeline detection 
efficiency calibrated with transit injection and recov¬ 
ery tests (|Christiansen et aDl2015f). the Q1-Q 16 Kepler 
planet caiididate catalog (|Mullallv et al.|l2015), and the 
KIC s tellar parameter catalog revision of iHuber et al.l 

(IWl . 


We fit the observed planet candidate sample us- 
ing a pmame tric PLDF model following the work of 
lYoudiiJ (|2011ll and explore alternative inputs into the 
calculation in order to study the systematic errors 
present. In general, we find higher occurrence rates 
for the mini-Neptune to terrestrial planet regime or¬ 
biting GK dwarfs and also larger uncertainties driven 


by the systematics than indicated by previous studies 
(iDong fc Zhul [ 20 II: iSilburt et al.l 120151 IPetigura et al.l 
l2013aibl : iMulders et al.l 1201511 . We determine that 
Fq = 0.77 planets per GK dwarf in the Kepler sam¬ 
ple have a planet within the 0.75<i?p<2.5 i?® and 
50<Porb<300 days regime with a systematic dominated 
allowable range of 0.28< Fq <1.9. The power law expo¬ 
nent for the Rp dependence in the PLDF model has a 
best value aavg=-1.5 indicating an increasing planet oc¬ 
currence towards small planets, but the allowed range, 
-3.25< Pavg <0.53, implies that we cannot definitively 
determine whether the occurrence increases or decreases 


towards the smallest planets. However, fitting a double 
power-law model over a wider range of 0.75<i?p<5.0 R^ 
does find conclusive evidence for a break in the occur¬ 
rence rate at i?brk = 3.3±to'4. 

We estimate a one year terrestrial planet occurrence 
rate, Ci.o = 0.1, with an acceptable range 0.01< Ci.o <2, 
by integrating within 20% of the Rp and Porb of Earth. 
The narrower Ci n parameter space i s a subset of the 
G dwarf HZ, 77 m fe asting et al.||19i ^ iSelsis et al.ll2?ifF^ 


iZsom et "m 1201,^ jKopparapu et al.l 120131 1. Thus, Ci.o 

places a lower limit on 77 ® for G dwarfs. Ci.o, which 
depends upon the direct observable Porb, facilitates com¬ 
parison with future Kepler occurrence rate studies and is 
readily compared to theoretical terrestrial planet forma¬ 
tion models. We defer estimates of 770 , which depends 
upon the indirect observable of stellar insolation and un¬ 
certain atmospheric evolution theory for terrestrial plan¬ 
ets outside the Solar System, to future studies. 

We also determine a 0.6 year terrestrial planet oc¬ 
currence rate, Co. 6=0.075, with an acceptable range 
0.013< Co. 6 <0.30, by integrating within 20 % of the 
Rp=l Rq and Porb 9 corresponding to Venus-type plan¬ 
ets for G dwarf hosts. For the K dwarfs of our sam¬ 
ple (Tefi<5200 K), Porb=0.6 yr roughly corresponds to 
the Solar-Earth insolation flux level. Thus, Co.6 places a 
lower limit on 770 for K dwarfs. 

Although the current results are dominated by system¬ 
atic uncertainties, which, unlike statistical uncertainties 
that are limited by the quantity and quality of data, can 
be minimized with additional study. Erom our analy¬ 
sis, we identify the leading sources of systematics: in¬ 
strumental false alarm contamination of the planet can¬ 
didate sample, determining planet radii (independent of 
the degeneracy with P*), pipeline completeness, and stel¬ 
lar parameters. Additional work on third light contami¬ 
nation, orbital eccentricity, astrophysical false positives, 
and false negative rate of the planet vetting process is 
needed. All of these should be examined carefully before 
accepting a definitive value for rj^. 
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